Dynamic Scheduling with Narrow Operand Values

Author

  • Erika Gunadi
Abstract

Tomasulo’s algorithm creates a dynamic execution order that extracts a high degree of instruction-level parallelism from a sequential program. Modern processors create this schedule early in the pipeline, before operand values have been computed, because present-day cycle-time demands preclude including a full ALU and bypass network delay in the instruction scheduling loop. As a result, modern schedulers must predict the latency of load instructions, since load latency cannot be determined within the scheduling pipeline. Whenever load latency is mispredicted because of an unanticipated cache miss or store alias, a significant amount of power is wasted on incorrectly issued dependent instructions that are already traversing the execution pipeline. This paper exploits the prevalence of narrow operand values (i.e., values with few significant bits) to solve this problem by placing a fast, narrow ALU and datapath within the scheduling loop. Virtually all load latency mispredictions can be accurately anticipated with this narrow datapath, and little power is wasted on executing incorrectly scheduled instructions. We show that such a narrow datapath design, coupled with a novel partitioned store queue and pipelined data cache, can achieve a cycle time comparable to conventional approaches, while dramatically reducing misspeculation, saving power, and improving per-cycle performance. Finally, we show that because misspeculation is rare in our architecture, a less complex flush-based recovery scheme suffices for high performance.
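To make the notion of a narrow operand concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes a 16-bit narrow datapath and a 64-bit machine word (the abstract states neither width), and the names `NARROW_BITS`, `FULL_BITS`, `sign_extend`, `is_narrow`, and `narrow_add` are hypothetical. A value is treated as narrow when sign-extending its low bits reproduces the full value, and the narrow add reports whether its result matches the full-width sum.

```python
NARROW_BITS = 16  # assumed narrow datapath width (not given in the abstract)
FULL_BITS = 64    # assumed full machine word width (not given in the abstract)

FULL_MASK = (1 << FULL_BITS) - 1
NARROW_MASK = (1 << NARROW_BITS) - 1


def sign_extend(low: int) -> int:
    """Sign-extend a NARROW_BITS-wide value to a FULL_BITS-wide bit pattern."""
    sign = 1 << (NARROW_BITS - 1)
    return ((low ^ sign) - sign) & FULL_MASK


def is_narrow(value: int) -> bool:
    """True if the full-width value is recoverable from its low NARROW_BITS."""
    value &= FULL_MASK  # interpret as a two's-complement bit pattern
    return sign_extend(value & NARROW_MASK) == value


def narrow_add(a: int, b: int) -> tuple[int, bool]:
    """Add using only the low NARROW_BITS of each operand.

    Returns (narrow_result, exact). `exact` is True when the sign-extended
    narrow result equals the full-width sum; a software model can check this
    directly, whereas hardware would need some other way to detect it.
    """
    narrow_result = sign_extend(((a & NARROW_MASK) + (b & NARROW_MASK)) & NARROW_MASK)
    full_result = (a + b) & FULL_MASK
    return narrow_result, narrow_result == full_result


if __name__ == "__main__":
    for a, b in [(3, 4), (-1, 1), (32767, 1), (1 << 20, 5)]:
        result, exact = narrow_add(a, b)
        print(f"{a} + {b}: narrow result {result:#x}, exact={exact}")
```

In this model, small positive and negative values pass through the narrow path unchanged, while sums that overflow 16 bits or operands with significant upper bits are flagged as inexact and would need the full-width datapath.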


Similar resources

Narrow Width Dynamic Scheduling

To satisfy the demand for higher performance, modern processors are designed with a high degree of speculation. While speculation enhances performance, it burns power unnecessarily. The cache, store queue, and load queue are accessed associatively before a matching entry is determined. A significant amount of power is wasted searching entries that are not selected. Modern processors speculatively...


Analysis of Flexible Multiplier Using Razor Based Dynamic Voltage Scaling for Filter Design

In this paper, we present a flexible multiprecision multiplier that combines variable precision, parallel processing (PP), razor-based dynamic voltage scaling (DVS), and dedicated MP operand scheduling to provide optimum performance for a variety of operating conditions. All of the building blocks of the proposed flexible multiplier can either work as independent small-precision multipliers or parallel ...


Dynamic operand transformation for low-power multiplier-accumulator design

The design of portable battery-operated devices requires low-power computation circuits. This paper presents a new multiplier-accumulator (MAC) design approach which, in contrast to existing methods, exploits dynamic operand transformation to reduce power consumption. The key idea is to compare current values of input operands with previous values and depending on computed Hamming distance to us...


Compiling Quantum Programs Using Genetic Algorithms

On many of these technologies, two-qubit gates (or, if you prefer, two-operand instructions) can only have neighboring qubits as operands. When two operands that are not next to each other are scheduled to be arguments to an instruction, they must be brought together by swapping qubit values (or variables) with their neighbors until the arguments are next to each other, and the algorithmically specif...


On Availability of Bit-Narrow Operations in General-Purpose Applications

Program instructions that consume and produce small operands can be executed in hardware circuitry of less than full size. We compare different proposed models of accounting for the usefulness of bit-positions in operands, using a run-time profiling tool, both to observe and summarize operand values, and to reconstruct and analyze the program’s data-flow graph to discover useless bits. We find ...



Publication date: 2002